Working with data and writing a report using R

This report summarizes the work we did for this group assignment. We grouped the work into two chapters:

  1. Exploratory Analysis.

  2. Statistical Analysis based on Rodent Data.

We first summarized the survey data to identify potential variables of interest:

##    record_id         month             day            year     
##  Min.   :    1   Min.   : 1.000   Min.   : 1.0   Min.   :1977  
##  1st Qu.: 8964   1st Qu.: 4.000   1st Qu.: 9.0   1st Qu.:1984  
##  Median :17762   Median : 6.000   Median :16.0   Median :1990  
##  Mean   :17804   Mean   : 6.474   Mean   :16.1   Mean   :1990  
##  3rd Qu.:26655   3rd Qu.:10.000   3rd Qu.:23.0   3rd Qu.:1997  
##  Max.   :35548   Max.   :12.000   Max.   :31.0   Max.   :2002  
##                                                                
##     plot_id       species_id            sex            hindfoot_length
##  Min.   : 1.00   Length:34786       Length:34786       Min.   : 2.00  
##  1st Qu.: 5.00   Class :character   Class :character   1st Qu.:21.00  
##  Median :11.00   Mode  :character   Mode  :character   Median :32.00  
##  Mean   :11.34                                         Mean   :29.29  
##  3rd Qu.:17.00                                         3rd Qu.:36.00  
##  Max.   :24.00                                         Max.   :70.00  
##                                                        NA's   :3348   
##      weight          genus             species              taxa          
##  Min.   :  4.00   Length:34786       Length:34786       Length:34786      
##  1st Qu.: 20.00   Class :character   Class :character   Class :character  
##  Median : 37.00   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 42.67                                                           
##  3rd Qu.: 48.00                                                           
##  Max.   :280.00                                                           
##  NA's   :2503                                                             
##   plot_type        
##  Length:34786      
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 
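A summary like the one above can be produced with `summary()`. The sketch below is only an illustration on a small made-up data frame: the name `surveys_toy` and its values are assumptions, not the survey data. It shows why numeric columns get quartiles and NA counts while character columns only report length, class and mode.

```r
# Toy stand-in for the survey table (hypothetical name and values).
surveys_toy <- data.frame(
  record_id       = 1:6,
  species_id      = c("DM", "DM", "DO", "NL", "PB", "DS"),
  sex             = c("M", "F", "M", "F", NA, "M"),
  hindfoot_length = c(36, 35, NA, 32, 26, 50),
  weight          = c(43, 41, 49, NA, 31, 120),
  stringsAsFactors = FALSE
)

# summary() reports quartiles and NA counts for numeric columns,
# and only length/class/mode for character columns.
print(summary(surveys_toy))

# Count missing values per column to see which variables need filtering.
na_counts <- colSums(is.na(surveys_toy))
print(na_counts)
```

Counting NAs per column this way is one quick check of which variables are complete enough to analyze.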

Chapter 1. Exploratory Analysis

Figure 1.1 | Taxa distribution of the survey data. Rodent is the major taxon, contributing 98.50% of all observations. Bird, rabbit and reptile are the minority, contributing approximately 1.29%, 0.22% and 0.04% respectively.
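Percentages of this kind can be computed from a frequency table. The following is a minimal sketch on a made-up vector `taxa_toy` (an assumption for illustration; the counts are not the survey counts).

```r
# Hypothetical taxa labels: 8 rodents, 1 bird, 1 rabbit.
taxa_toy <- c(rep("Rodent", 8), "Bird", "Rabbit")

# table() counts each taxon; prop.table() converts counts to proportions.
taxa_counts <- table(taxa_toy)
taxa_pct <- round(100 * prop.table(taxa_counts), 2)
print(taxa_pct)
```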

Figure 1.2 | Species distribution within each taxon. Bird has a total of 11 species, with Amphispiza bilineata being the majority (67.3%), while rabbit has only one, Sylvilagus audubonii. Reptile has seven species in total, with Sceloporus undulatus (35.7%) and Lizard sp. (28.6%) being the two major species. Rodent has the richest species distribution, a total of 29 species, with Dipodomys merriami contributing 30.9% of the observations.

Figure 1.3 | Taxa observations spanning 26 years (1977 to 2002). Rodents were observed consistently every year, with 1977 being the peak of observation. Bird and rabbit show a similar pattern but are missing data for a couple of years. Reptiles were observed in only eight intermittent years.

Figure 1.4 | Average weight by species from 1977 to 2002. Only the rodent taxon has complete information on average weight, so the species IDs shown here are all rodent species. Most species have a low average weight (below 50 g), and not every species was recorded in every year from 1977 to 2002. NL has the highest average weight of all species and was recorded consistently throughout the years.

Chapter 2. Statistical Analysis based on Rodent Data

Based on preliminary investigations, we determined that Rodent was the only taxon for which hindfoot length or sex was recorded. Because of this, any graphs that include hindfoot length or sex are strictly for the rodent species in the study. The number of rodents in the study is shown below.
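One way to check which taxa have a variable recorded is to count non-missing values per taxon. The sketch below uses a made-up data frame `taxa_toy_df` (a hypothetical name with invented values), so it only illustrates the check rather than reproducing our investigation.

```r
# Hypothetical records: only the rodent rows carry sex or hindfoot length.
taxa_toy_df <- data.frame(
  taxa            = c("Rodent", "Rodent", "Rodent", "Bird", "Rabbit", "Reptile"),
  sex             = c("M", "F", NA, NA, NA, NA),
  hindfoot_length = c(36, 35, 32, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Number of non-missing values of each variable within each taxon.
sex_by_taxa <- tapply(!is.na(taxa_toy_df$sex), taxa_toy_df$taxa, sum)
hfl_by_taxa <- tapply(!is.na(taxa_toy_df$hindfoot_length), taxa_toy_df$taxa, sum)
print(sex_by_taxa)
print(hfl_by_taxa)

# Taxa with at least one recorded value.
taxa_with_sex <- names(sex_by_taxa)[sex_by_taxa > 0]
taxa_with_hfl <- names(hfl_by_taxa)[hfl_by_taxa > 0]
```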

Figure 2.1 | The pie chart shows the distribution of the 29 rodent species. Dipodomys merriami is the most abundant species, contributing 30.90% of the total. The second most abundant species is Chaetodipus penicillatus (9.12%), followed by Dipodomys ordii (8.84%). The three least abundant species are Chaetodipus sp., Reithrodontomys sp. and Spermophilus tereticaudus.

Figure 2.2 | The bar plot shows the sex distribution of rodents. There are 17,348 males and 15,690 females, so males outnumber females by 1,658.

##   species_id hfl_mean   hfl_sd
## 1         BA 13.00000 1.718879
## 2         DM 35.98283 1.464788
## 3         DO 35.60714 1.665163
## 4         DS 49.94880 2.084383
## 5         NL 32.25746 1.791916
## 6         OL 20.53377 1.434295

Figure 2.3 | The bar plot shows average hindfoot length by rodent species ID. DS has the longest average hindfoot length (about 50 mm), whereas BA has the shortest (about 13 mm). Most of the species are within the range of 18-22 mm.
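A per-species table of means and standard deviations, like the `hfl_mean` and `hfl_sd` columns above, can be built with base R's `aggregate()`. This is a sketch under assumptions: the data frame `hfl_toy` and its values are invented, and the original table may have been produced differently.

```r
# Hypothetical hindfoot measurements for three species IDs.
hfl_toy <- data.frame(
  species_id      = c("BA", "BA", "DM", "DM", "DS", "DS"),
  hindfoot_length = c(12, 14, 35, 37, 49, 51)
)

# Mean and standard deviation of hindfoot length within each species.
hfl_mean <- aggregate(hindfoot_length ~ species_id, data = hfl_toy, FUN = mean)
hfl_sd   <- aggregate(hindfoot_length ~ species_id, data = hfl_toy, FUN = sd)

# Combine both summaries into one table keyed by species ID.
hfl_stats <- merge(hfl_mean, hfl_sd, by = "species_id")
names(hfl_stats) <- c("species_id", "hfl_mean", "hfl_sd")
print(hfl_stats)
```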

This plot suggests that hindfoot length (mm) differs by species ID. We conducted several tests to determine whether this was the case. We began with a Bartlett test of homogeneity of variances to evaluate the equal-variance assumption of ANOVA.

## 
##  Bartlett test of homogeneity of variances
## 
## data:  hindfoot_length by species_id
## Bartlett's K-squared = 2081.4, df = 22, p-value < 2.2e-16

Because the p-value (< 2.2e-16) is less than the alpha of 0.05, we reject the null hypothesis that the variances of hindfoot length are equal across the levels of species ID. Our data therefore do not meet the equal-variance assumption of ANOVA, so we used a non-parametric alternative, the Kruskal-Wallis rank sum test.
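The decision rule described above can be sketched with `bartlett.test()` from base R's stats package. The data frame `var_toy` below is invented for illustration (three groups with the same mean but one much wider spread); it is not the survey data.

```r
# Hypothetical groups: A and B span 9..11, C spans 0..20 (larger variance).
var_toy <- data.frame(
  group = rep(c("A", "B", "C"), each = 50),
  value = c(seq(9, 11, length.out = 50),
            seq(9, 11, length.out = 50),
            seq(0, 20, length.out = 50))
)

# Bartlett's test: the null hypothesis is that all group variances are equal.
bt <- bartlett.test(value ~ group, data = var_toy)
print(bt)

# Reject the null at alpha = 0.05 when the p-value falls below it.
alpha <- 0.05
reject_equal_variances <- bt$p.value < alpha
```

When `reject_equal_variances` is `TRUE`, a rank-based test is a reasonable fallback for comparing the groups.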

## 
##  Kruskal-Wallis rank sum test
## 
## data:  hindfoot_length by as.factor(species_id)
## Kruskal-Wallis chi-squared = 28681, df = 22, p-value < 2.2e-16

Because the p-value (< 2.2e-16) is less than the alpha of 0.05, we reject the null hypothesis that hindfoot length follows the same distribution in all species. However, this test does not tell us which species differ from which others. To determine this, we conducted non-parametric pairwise Wilcoxon rank sum tests. In the output below, pairs with values < 0.05 are significantly different from each other, and for those pairs we reject the null hypothesis that the two groups share the same distribution.
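As a minimal illustration of the Kruskal-Wallis step, the sketch below runs `kruskal.test()` on invented data (`kw_toy` is a hypothetical name) in which the three groups do not overlap at all.

```r
# Hypothetical groups with fully separated values and no ties.
kw_toy <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  value = c(1:10, 11:20, 21:30)
)

# Kruskal-Wallis compares groups using ranks instead of raw values,
# so it does not rely on equal variances or normality.
kw <- kruskal.test(value ~ as.factor(group), data = kw_toy)
print(kw)

# Degrees of freedom equal the number of groups minus one.
kw_df <- unname(kw$parameter)
kw_significant <- kw$p.value < 0.05
```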

## 
##  Pairwise comparisons using Wilcoxon rank sum test 
## 
## data:  surveys_hfoot_id$hindfoot_length and as.factor(surveys_hfoot_id$species_id) 
## 
##    BA      DM      DO      DS      NL      OL      OT      OX      PB     
## DM < 2e-16 -       -       -       -       -       -       -       -      
## DO < 2e-16 < 2e-16 -       -       -       -       -       -       -      
## DS < 2e-16 < 2e-16 < 2e-16 -       -       -       -       -       -      
## NL < 2e-16 < 2e-16 < 2e-16 < 2e-16 -       -       -       -       -      
## OL < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 -       -       -       -      
## OT < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 5.5e-09 -       -       -      
## OX 0.00474 3.9e-05 4.3e-05 6.1e-05 5.2e-05 1.00000 1.00000 -       -      
## PB < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 1.9e-05 -      
## PE < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 1.4e-08 1.00000 1.00000 < 2e-16
## PF < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.00588 < 2e-16
## PH 9.3e-12 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.00085 1.00000
## PI 0.00028 3.9e-05 4.3e-05 6.1e-05 6.0e-05 0.00041 6.2e-05 0.03353 2.3e-05
## PL 2.6e-11 < 2e-16 < 2e-16 < 2e-16 < 2e-16 1.00000 1.00000 1.00000 < 2e-16
## PM < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.35600 0.01793 1.00000 < 2e-16
## PP < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.00289 < 2e-16
## RF < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.02961 < 2e-16
## RM < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.00531 < 2e-16
## RO 0.00474 3.9e-05 4.3e-05 6.1e-05 5.1e-05 4.0e-05 2.4e-05 0.47563 1.9e-05
## RX 0.48931 0.46181 0.46746 0.48117 0.47563 1.00000 1.00000 1.00000 0.38031
## SF 2.2e-13 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.00085 0.74977
## SH < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.00016 < 2e-16
## SO 1.1e-13 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 0.00362 1.00000
##    PE      PF      PH      PI      PL      PM      PP      RF      RM     
## DM -       -       -       -       -       -       -       -       -      
## DO -       -       -       -       -       -       -       -       -      
## DS -       -       -       -       -       -       -       -       -      
## NL -       -       -       -       -       -       -       -       -      
## OL -       -       -       -       -       -       -       -       -      
## OT -       -       -       -       -       -       -       -       -      
## OX -       -       -       -       -       -       -       -       -      
## PB -       -       -       -       -       -       -       -       -      
## PE -       -       -       -       -       -       -       -       -      
## PF < 2e-16 -       -       -       -       -       -       -       -      
## PH < 2e-16 < 2e-16 -       -       -       -       -       -       -      
## PI 7.7e-05 1.9e-05 0.00336 -       -       -       -       -       -      
## PL 1.00000 < 2e-16 2.1e-10 0.00079 -       -       -       -       -      
## PM 0.00718 < 2e-16 < 2e-16 0.00011 1.00000 -       -       -       -      
## PP < 2e-16 < 2e-16 < 2e-16 0.74977 9.6e-12 < 2e-16 -       -       -      
## RF < 2e-16 < 2e-16 1.5e-14 6.9e-05 6.3e-13 < 2e-16 < 2e-16 -       -      
## RM < 2e-16 < 2e-16 < 2e-16 1.8e-05 < 2e-16 < 2e-16 < 2e-16 < 2e-16 -      
## RO 3.6e-05 1.00000 0.00085 0.03353 0.00132 2.6e-05 3.2e-05 0.00207 0.35077
## RX 1.00000 0.51958 0.56697 0.95423 1.00000 1.00000 0.56517 1.00000 1.00000
## SF < 2e-16 < 2e-16 1.00000 0.01315 4.4e-11 < 2e-16 < 2e-16 < 2e-16 < 2e-16
## SH < 2e-16 < 2e-16 3.1e-07 0.00046 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16
## SO < 2e-16 < 2e-16 1.00000 0.04950 3.4e-09 < 2e-16 < 2e-16 2.8e-15 < 2e-16
##    RO      RX      SF      SH     
## DM -       -       -       -      
## DO -       -       -       -      
## DS -       -       -       -      
## NL -       -       -       -      
## OL -       -       -       -      
## OT -       -       -       -      
## OX -       -       -       -      
## PB -       -       -       -      
## PE -       -       -       -      
## PF -       -       -       -      
## PH -       -       -       -      
## PI -       -       -       -      
## PL -       -       -       -      
## PM -       -       -       -      
## PP -       -       -       -      
## RF -       -       -       -      
## RM -       -       -       -      
## RO -       -       -       -      
## RX 1.00000 -       -       -      
## SF 0.00060 0.56697 -       -      
## SH 0.00013 0.51958 0.00101 -      
## SO 0.00091 0.78274 1.00000 1.7e-06
## 
## P value adjustment method: holm
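The matrix above has the lower-triangular layout returned by `pairwise.wilcox.test()`. The sketch below shows how such a matrix can be read on invented data (`pw_toy` is hypothetical): group A is far from B and C, while B and C interleave.

```r
# Hypothetical groups: A is separated from B and C; B and C interleave.
pw_toy <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  value = c(1:10, 11:20, (11:20) + 0.5)
)

# One Wilcoxon rank sum test per pair, with Holm-adjusted p-values.
pw <- pairwise.wilcox.test(pw_toy$value, pw_toy$group,
                           p.adjust.method = "holm")
print(pw)

# The p-values are stored as a lower-triangular matrix:
# rows are the later groups, columns the earlier ones.
p_mat <- pw$p.value
a_vs_b <- p_mat["B", "A"]
b_vs_c <- p_mat["C", "B"]

# List the pairs that differ at alpha = 0.05.
sig_pairs <- which(p_mat < 0.05, arr.ind = TRUE)
print(sig_pairs)
```

The Holm adjustment controls the family-wise error rate across all pairs, which matters when there are as many comparisons as in the species table.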

Figure 2.4 | The bar plot shows average hindfoot length by plot type. Control and Spectab exclosure have similar average rodent hindfoot lengths.

This plot suggests that hindfoot length (mm) differs by plot type. We used the same techniques as above to determine whether this was the case. We began with a Bartlett test of homogeneity of variances to evaluate the equal-variance assumption of ANOVA.

## 
##  Bartlett test of homogeneity of variances
## 
## data:  hindfoot_length by plot_type
## Bartlett's K-squared = 1069.6, df = 4, p-value < 2.2e-16

Because the p-value (2.9308922 × 10^{-230}) is less than the alpha of 0.05, we reject the null hypothesis that the variances of hindfoot length are equal across the levels of plot type. Our data therefore do not meet the equal-variance assumption of ANOVA, so a non-parametric alternative is needed. As we did above, we used a Kruskal-Wallis rank sum test.
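Very small p-values are easy to misreport: rounding prints them as 0, while R's test output shows them as below a threshold. The sketch below uses the value quoted in the paragraph above and base R formatting helpers. It is a suggestion for how such values could be printed, not a description of how this report was generated.

```r
# The Bartlett p-value quoted in the text above.
p_tiny <- 2.9308922e-230

# Rounding to a few decimals collapses the value to exactly 0.
p_rounded <- round(p_tiny, 4)

# Scientific notation keeps the magnitude visible.
p_sci <- format(p_tiny, digits = 3)

# format.pval() reports anything below `eps` as "less than eps",
# matching the style of R's printed test output.
p_report <- format.pval(p_tiny, eps = 2.2e-16)

print(c(rounded = as.character(p_rounded), scientific = p_sci, report = p_report))
```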

## 
##  Kruskal-Wallis rank sum test
## 
## data:  hindfoot_length by as.factor(plot_type)
## Kruskal-Wallis chi-squared = 5808.5, df = 4, p-value < 2.2e-16

Because the p-value (< 2.2e-16) is less than the alpha of 0.05, we reject the null hypothesis that rodent hindfoot length follows the same distribution in all plot types. However, this test does not tell us which plot types differ from which others. To determine this, we conducted non-parametric pairwise Wilcoxon rank sum tests. In the output below, all values are < 0.05, so every pair of plot types is significantly different and we reject the null hypothesis of a shared distribution for each pair.

## 
##  Pairwise comparisons using Wilcoxon rank sum test 
## 
## data:  surveys_hfoot_plottype$hindfoot_length and as.factor(surveys_hfoot_plottype$plot_type) 
## 
##                           Control Long-term Krat Exclosure
## Long-term Krat Exclosure  <2e-16  -                       
## Rodent Exclosure          <2e-16  1e-05                   
## Short-term Krat Exclosure <2e-16  <2e-16                  
## Spectab exclosure         <2e-16  <2e-16                  
##                           Rodent Exclosure Short-term Krat Exclosure
## Long-term Krat Exclosure  -                -                        
## Rodent Exclosure          -                -                        
## Short-term Krat Exclosure <2e-16           -                        
## Spectab exclosure         <2e-16           <2e-16                   
## 
## P value adjustment method: holm

Figure 2.5 | The stacked bar plot in the left panel shows the total number of rodents captured per year by species. DM has the highest count from 1977 to 1999, and PM overtakes it from 2000 to 2002. The right panel shows the sex distribution by species ID. Overall, most species have an even female and male distribution, except OX, PI and RX, which have only males. PX has one female and one male, but these counts are too small to be visible on the plot.

Figure 2.6 | Weight density distribution by sex for five different plot types. In most plot types the density is concentrated at the low end, with the weights of both sexes mostly between 0 and 100 g. In the Spectab exclosure plot type, however, males outweigh females. In the Control, Rodent Exclosure and Short-term Krat Exclosure plot types, females have a lower density but are heavier than males.

Figure 2.7 | Hindfoot length density distribution by sex for five different plot types. Control and Rodent Exclosure show a similar density pattern, and the female and male curves overlap in both plots. In Spectab exclosure, males are more densely distributed at hindfoot lengths in the range of 30-40 mm.

Figure 2.8 | Species density distribution by year for five different plot types. In most plot types, species are concentrated between 1985 and 1995. The species density for Spectab exclosure tends to be skewed toward the right.

Figure 2.9 | Correlation of hindfoot length and weight by species IDs.

This suggests that there is a relationship between hindfoot length (mm) and weight (g) in rodents. We fitted a linear model to determine whether these variables are indeed related.

## 
## Call:
## lm(formula = hindfoot_length ~ weight, data = rodent_complete)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -40.400  -5.584  -0.509   6.125  36.028 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 21.572622   0.061245   352.2   <2e-16 ***
## weight       0.182831   0.001115   164.0   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.964 on 30674 degrees of freedom
## Multiple R-squared:  0.4673, Adjusted R-squared:  0.4673 
## F-statistic: 2.69e+04 on 1 and 30674 DF,  p-value: < 2.2e-16

Because the p-value is less than the alpha of 0.05, we reject the null hypothesis that the slope of the linear regression model is zero. In addition, the multiple R-squared value describes how well a model explains variation in the data. In this case, weight explains 46.73% of the variation in hindfoot length.
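To make the model concrete, the sketch below first applies the coefficients reported above to a hypothetical 40 g rodent, then fits the same kind of model to a small invented data set (`lm_toy` is an assumption, not `rodent_complete`) to show where the slope, its p-value and R-squared are stored.

```r
# Fitted line from the output above: hindfoot_length = 21.572622 + 0.182831 * weight.
intercept_reported <- 21.572622
slope_reported <- 0.182831
predicted_40g <- intercept_reported + slope_reported * 40  # about 28.9

# Toy data (hypothetical values) to show how the pieces are extracted.
lm_toy <- data.frame(
  weight          = c(10, 20, 30, 40, 50),
  hindfoot_length = c(22, 26, 27, 31, 34)
)
fit <- lm(hindfoot_length ~ weight, data = lm_toy)
fit_summary <- summary(fit)

# Slope estimate, its p-value, and the share of variance explained.
slope <- coef(fit)[["weight"]]
slope_p <- fit_summary$coefficients["weight", "Pr(>|t|)"]
r_squared <- fit_summary$r.squared
print(c(slope = slope, p_value = slope_p, r_squared = r_squared))
```

For a model with a single predictor, R-squared is the square of the Pearson correlation between the two variables, so the reported 0.4673 corresponds to a correlation of roughly 0.68.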

Figure 2.10 | Rodent weight distribution (g) for female and male according to species IDs.

This suggests that there is a relationship between sex and weight (g) in rodents. We conducted a Welch two-sample t-test to compare the weights of males with those of females.

## 
##  Welch Two Sample t-test
## 
## data:  Rodent_Female$weight and Rodent_Male$weight
## t = -2.0226, df = 31751, p-value = 0.04312
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.62412287 -0.02552529
## sample estimates:
## mean of x mean of y 
##  42.17055  42.99538

Because the p-value (0.0431195) is less than the alpha of 0.05, we reject the null hypothesis that the mean weights of the two sexes are the same. In the output above, “mean of x” is the mean weight of females and “mean of y” is the mean weight of males, so males are on average about 0.82 g heavier.
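In R, `t.test()` runs the Welch version by default, because `var.equal` defaults to `FALSE`. The classic Student's t-test needs `var.equal = TRUE`. The sketch below shows the difference on invented weights (the vectors `female_w` and `male_w` are hypothetical).

```r
# Hypothetical weights in grams for two small samples.
female_w <- c(38, 41, 44, 40, 43, 39)
male_w   <- c(42, 45, 47, 44, 48, 46)

# Default call: Welch's t-test, which does not assume equal variances.
welch <- t.test(female_w, male_w)
print(welch$method)

# Student's t-test requires asking for pooled variances explicitly.
student <- t.test(female_w, male_w, var.equal = TRUE)
print(student$method)

# "mean of x" is the first sample, "mean of y" the second.
mean_diff <- welch$estimate[["mean of y"]] - welch$estimate[["mean of x"]]
```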